
feat(perception): monocular obstacle avoidance via optical flow 1775 #1908

Draft
jhengyilin wants to merge 5 commits into dimensionalOS:dev from
jhengyilin:feature/optical-flow-obstacle-avoidance-1775

Conversation

@jhengyilin

@jhengyilin jhengyilin commented Apr 23, 2026

feat(perception): monocular obstacle avoidance via optical flow τ-estimation

Branch: feature/optical-flow-obstacle-avoidance-1775 → main


Summary

Implements OpticalFlowModule — real-time Time-to-Contact (τ) estimation from
sparse optical flow on a single RGB camera. No depth sensor, no calibration, no
GPU. Designed as a complement to LiDAR proximity sensing, not a replacement.
Closes dimensionalOS#1775.

Answers to the issue's questions

"Can optical flow be used for real-time obstacle avoidance?"
Yes, as a complement to LiDAR — not a replacement. It detects fast-approaching
obstacles via τ = 1/divergence with 93.9% precision when it fires, but recall
is 0.295, so it only catches ~30% of danger frames. It is blind to stationary
obstacles. Recommended: pair with LiDAR for static proximity; use optical flow
for dynamic and fast-approaching obstacles.

"How much does optical flow correlate with LiDAR?"
Weakly. Spearman r = +0.173 — correct direction but too noisy for a calibrated
timer. Use as a ranked urgency signal, not a precise TTC.

"What keypoint strategies? What does ORB-SLAM use?"
ORB-SLAM uses Oriented FAST for rotation-invariant BRIEF descriptor matching.
That orientation step is irrelevant for Lucas-Kanade, which tracks image patches
directly and has no orientation input. Plain FAST is the correct choice here.

Problem

The Go2 uses LiDAR for obstacle proximity but cannot answer "how fast is it
approaching?" This adds a camera-based early-warning signal for fast-approaching
or dynamic obstacles, complementing LiDAR rather than replacing it.

What Was Built

dimos/perception/optical_flow/
├── __init__.py
├── tac_estimator.py           ← divergence → τ math 
├── optical_flow_module.py     ← dimos Module: camera in, signals out
└── backends/lucas_kanade.py   ← FAST corners + Lucas-Kanade tracking

scripts/eval_optical_flow_gated.py   ← odom-gated evaluation vs LiDAR
scripts/visualize_optical_flow.py    ← live Rerun viewer (annotated frames + τ plot)
RESULTS.md                           ← theory, methodology, full results

Pipeline at a glance:

flowchart LR
    A["Camera RGB frame"] --> B["FAST keypoint detection"]
    B --> C["Lucas-Kanade tracking"]
    C --> D["Flow vectors (dx, dy)"]
    D --> E["5×5 grid divergence"]
    E --> F["τ = 1 / divergence"]
    F --> G{"τ < threshold?"}
    G -- yes --> H["DANGER signal"]
    G -- no --> I["CLEAR"]
    J["IMU ω (yaw rate)"] --> K{"|ω| > 0.3?"}
    K -- yes --> L["GATED — suppress danger"]
    K -- no --> G
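The grid τ computation in the flowchart can be sketched in a few lines. This is an illustrative numpy toy, not the module's code: a synthetic radially expanding flow field where the analytic answer is known.

```python
import numpy as np

# Toy sketch (not the module's code): tau = 1 / divergence on a synthetic
# radially expanding flow field, binned into the 5x5 grid from the flowchart.
H, W = 100, 100
y, x = np.mgrid[0:H, 0:W].astype(float)
k = 0.1                        # expansion rate per frame
u = k * (x - W / 2)            # horizontal flow component
v = k * (y - H / 2)            # vertical flow component

div = np.gradient(u, axis=1) + np.gradient(v, axis=0)  # = 2k everywhere here

# mean divergence per 5x5 grid cell, then tau = 1 / divergence
cells = div.reshape(5, H // 5, 5, W // 5).mean(axis=(1, 3))
tau = np.where(cells > 1e-6, 1.0 / cells, np.inf)
# tau.min() is ~5.0 frames, i.e. 1 / (2k)
```

For pure expansion the divergence is constant, so every cell reports the same τ; on real footage the approaching obstacle's cells report a smaller τ than the background.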

Module Interface

| Stream | Direction | Type | Description |
| --- | --- | --- | --- |
| color_image | In | Image | RGB/BGR camera feed |
| angular_velocity | In | Any | Yaw rate (rad/s), optional |
| danger_signal | Out | Bool | True when min τ < threshold AND not turning |
| tac_grid | Out | Any | 5×5 float array of per-cell τ values |
| flow_visualization | Out | Image | Debug overlay: flow vectors + grid |

When angular_velocity is connected and |ω| > omega_max (default 0.3 rad/s),
danger_signal is forced False. If unused, _last_omega stays 0 and the gate
is transparent — backward compatible.
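The gating behaviour above can be sketched as follows. The class and method names are hypothetical, purely for illustration; the actual module wires this through dimos streams.

```python
# Hypothetical sketch of the rotation gate described above; class and
# method names are illustrative, not the module's real API.
class RotationGate:
    def __init__(self, omega_max=0.3):
        self.omega_max = omega_max
        self._last_omega = 0.0   # stays 0 if angular_velocity never connects

    def on_angular_velocity(self, omega):
        self._last_omega = omega

    def gate(self, raw_danger):
        # suppress danger while turning: rotation breaks the tau assumptions
        if abs(self._last_omega) > self.omega_max:
            return False
        return raw_danger

gate = RotationGate()
print(gate.gate(True))           # True: gate is transparent when unused
gate.on_angular_velocity(0.5)    # |omega| > omega_max
print(gate.gate(True))           # False: danger suppressed while turning
```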

Evaluation Results

Latency: ~2.5 ms/frame

τ Correlation with LiDAR TTC (82 odom-gated frames: v_fwd > 0.15 m/s, |ω| < 0.15 rad/s)

| Pearson r | Spearman r | RMSE |
| --- | --- | --- |
| +0.144 | +0.173 | 3.31 |

Correct sign — closer obstacle → smaller τ — but magnitude is noisy; use τ as a
ranked urgency signal, not a precise timer.

Binary Detection (GT: lidar_dist < 1.5 m, GT positive rate: 84%)

| τ threshold | Precision | Recall | F1 |
| --- | --- | --- | --- |
| 3.0 (default) | 0.939 | 0.295 | 0.449 |
| 5.8 (max F1) | 0.918 | 0.533 | 0.675 |

At τ=3.0, 93.9% of alarms are confirmed by LiDAR. Tune tau_threshold per use-case.
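The metric itself can be reproduced on toy numbers (the values below are made up for illustration, not taken from the evaluation): predict danger when τ falls below the threshold, take ground truth as lidar_dist < 1.5 m.

```python
import numpy as np

# Toy sketch of the binary-detection metric; values are illustrative only.
tau        = np.array([2.1, 8.0, 2.9, 4.0, 1.5, 9.0])   # per-frame min tau
lidar_dist = np.array([0.9, 3.0, 1.2, 1.1, 1.6, 2.5])   # metres

pred = tau < 3.0          # alarm fired
gt   = lidar_dist < 1.5   # obstacle actually close, per LiDAR

tp = np.sum(pred & gt)
precision = tp / pred.sum()   # fraction of alarms confirmed by LiDAR
recall    = tp / gt.sum()     # fraction of danger frames caught
f1 = 2 * precision * recall / (precision + recall)
```

Sweeping the threshold and re-running this computation is how a max-F1 operating point like τ=5.8 is found.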

Live visualization. scripts/visualize_optical_flow.py replays
unitree_office_walk through the same backend and streams annotated frames,
the tau grid, and a min_tau time-series into a Rerun viewer — scrub any
frame to see which cell fired and why.

Known Limitations

| Limitation | Consequence | Mitigation |
| --- | --- | --- |
| Stationary obstacles produce no flow | Blind to static walls at constant distance | LiDAR handles static proximity |
| τ value noisy (RMSE ≈ 3 s) | Not a precise collision timer | Ranked alarm; wire tac_grid to planner |
| Low-texture surfaces yield few keypoints | Reduced grid coverage | Edge-enhancing preprocessing |

Files Changed

New — production code: dimos/perception/optical_flow/ (5 files)
New — evaluation: scripts/eval_optical_flow_gated.py, RESULTS.md
New — demo: scripts/visualize_optical_flow.py (Rerun viewer)
No existing files modified.

  Monocular Time-to-Contact detection via FAST keypoints + Lucas-Kanade flow.
  Follows dimos Config pattern; rotation-gated via optional angular_velocity
  stream. Closes dimensionalOS#1775.
  Eval: P=0.939 at τ=3.0 on unitree_office_walk. Visualization streams annotated frames +
  τ timeline to Rerun.
@leshy
Contributor

leshy commented Apr 24, 2026

can you elaborate on why a mobile robot cannot avoid stationary obstacles with optical flow?

how can I test this? how did you test this? for example,

"How much does optical flow correlate with LiDAR?"
Weakly. Spearman r = +0.173 — correct direction but too noisy for a calibrated

can you show me how to reproduce this number?

@jhengyilin
Author

Testing

I evaluated the module offline against LiDAR ground truth on the unitree_office_walk dataset (51s indoor sequence). The eval script is at scripts/eval_optical_flow_gated.py — it runs every frame through the optical flow backend, filters for frames where the robot is moving forward (speed > 0.15 m/s) and not turning (|ω| < 0.15 rad/s) to isolate pure translation, then uses the LiDAR distance on those frames as ground truth to test optical flow against. It computes:

  • Correlation — Spearman r = +0.173 between optical flow τ and LiDAR TTC
  • Binary detection — Precision = 0.939, Recall = 0.295 at τ=3.0 (we take Ground Truth as LiDAR distance < 1.5m)
  • Latency — ~2.5 ms/frame
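The odom gating step can be sketched like this (field names and values are illustrative, not the dataset's actual schema):

```python
import numpy as np

# Illustrative odom gate: keep only near-pure-translation frames before
# comparing optical-flow tau against LiDAR TTC. Numbers are made up.
speed = np.array([0.05, 0.30, 0.20, 0.25])   # forward speed, m/s
omega = np.array([0.00, 0.02, 0.40, 0.10])   # yaw rate, rad/s
dist  = np.array([2.00, 1.20, 0.80, 1.50])   # nearest LiDAR distance, m

keep = (speed > 0.15) & (np.abs(omega) < 0.15)

# LiDAR "ground truth" TTC on the kept frames: distance / forward speed
lidar_ttc = dist[keep] / speed[keep]
print(lidar_ttc)   # [4. 6.]
```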

To reproduce, first place the unitree_office_walk dataset folder under data/, then run from repo root:

Evaluation:

PYTHONPATH=.venv/lib/python3.12/site-packages/rerun_sdk \
    uv run python3 scripts/eval_optical_flow_gated.py

Visualization (opens a Rerun viewer where you can scrub the timeline and see flow arrows, τ grid, and DANGER/CLEAR labels frame-by-frame):

PYTHONPATH=.venv/lib/python3.12/site-packages/rerun_sdk \
    uv run python3 scripts/visualize_optical_flow.py

Results land in results/optical_flow_gated.json. Full methodology and numbers are in RESULTS.md.


Stationary obstacles and optical flow

A mobile robot can avoid stationary obstacles with optical flow — as long as two conditions are met:

  1. Relative motion exists — the robot is moving toward the obstacle. The robot's own forward motion produces an expanding flow field in the image (pixels diverge outward from the focus of expansion), which gives a finite τ = 1/divergence. The obstacle doesn't need to move on its own.

  2. The obstacle has trackable texture — FAST needs to detect keypoints on the obstacle's surface (edges, corners, texture). A featureless surface may yield too few keypoints for a reliable divergence estimate in that grid cell.

The only case where optical flow truly can't help is when there is zero relative motion between the camera and the obstacle — e.g., the robot is stationary, or moving parallel to a wall at constant distance. No pixel displacement means zero divergence and no alarm. That's the specific case where LiDAR is still needed for proximity sensing.
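A small worked example under an assumed pinhole model makes point 1 concrete. For pure forward motion toward a static frontal wall, the flow in normalized image coordinates is (x, y)·v/Z, so the divergence is 2v/Z and 2/divergence recovers the true time-to-contact Z/v for this geometry (the factor of 2 is a property of the frontal-plane model, not a claim about the module's code):

```python
import numpy as np

# Pinhole-model sketch: camera moving forward at speed v toward a static
# frontal wall at depth Z. Flow in normalized coords is (x, y) * v / Z,
# so divergence = 2 v / Z and true TTC Z / v = 2 / divergence.
Z, speed = 3.0, 0.5                      # wall 3 m away, closing at 0.5 m/s
y, x = np.mgrid[-1:1:100j, -1:1:100j]    # normalized image coordinates
u_flow = x * speed / Z
v_flow = y * speed / Z

du_dx = np.gradient(u_flow, x[0], axis=1)
dv_dy = np.gradient(v_flow, y[:, 0], axis=0)
div = du_dx + dv_dy                      # ~ 2 * speed / Z everywhere

ttc = 2.0 / div.mean()
print(round(ttc, 6))                     # 6.0 == Z / speed
```

The wall never moves; the finite TTC comes entirely from the robot's own motion, which is why a moving robot can detect static obstacles but a stationary one cannot.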

@leshy
Contributor

leshy commented Apr 24, 2026

  • you need to understand all code written and all decisions made.
  • you can do visual inspection; you don't need fancy LiDAR correlations, it should be obvious whether optical flow is detecting obstacles that are visible on screen or not
  • if doing fancy LiDAR correlation, make sure it's actually correct: visualize the expected obstacles on the camera image (according to LiDAR). Small timestamp differences between odom and video, and the way you take camera lens distortion into account, are critical (this isn't necessary at all for a first pass; but if doing it, make sure it's correct, since it's better to have no data than wrong data)
  • do we need this grid at all? I think there should be a nicer algo that gives you a gradient of the rate of divergence of keypoints across an image? it's ok if not easy etc, just asking as I assume there is such a thing
  • the keypoint detection algo used is heavy and sparse; you want light (easy to compute) and non-sparse keypoints. idk if this is a different setting for your existing keypoint algo, or an entirely different algo

…danger on connected-components

  Addresses review feedback on dimensionalOS#1775:
  - FAST corners → uniform meshgrid (light, non-sparse keypoints)
  - per-point FOE τ → divergence-on-grid τ
  - min(τ) alarm → connected-components on thresholded divergence
  - LK reconstruction-error filter on per-point quality
  - LiDAR-correlation evaluation removed
@jhengyilin
Author

jhengyilin commented Apr 26, 2026

Optical flow obstacle avoidance

τ (time to contact) math

We use divergence (∂u/∂x + ∂v/∂y, computed via np.gradient on the flow field) and τ = 2 / divergence as the backend math for time to contact. This is more robust than the FOE (focus of expansion) approach, which breaks down for motion near the FOE, where flow vectors are small.

Keypoint detector

We originally considered the FAST keypoint detector, but it is designed for SLAM use (which requires keypoints plus descriptors) and is relatively expensive to compute. Furthermore, we need flow at every image location, including smooth surfaces (for example, a wall or a large single-texture obstacle), where FAST can hardly find corners. A uniform grid is content-independent and constant-compute, a better fit for "light + non-sparse keypoints".
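Generating the uniform grid is a one-liner (the spacing below is illustrative, not the module's configured value):

```python
import numpy as np

# Content-independent uniform grid of LK track points (illustrative spacing):
# constant compute per frame and full-frame coverage, smooth walls included.
H, W, step = 480, 640, 16
ys, xs = np.mgrid[step // 2:H:step, step // 2:W:step]
pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
pts = pts.reshape(-1, 1, 2)   # the (N, 1, 2) layout cv2.calcOpticalFlowPyrLK expects
print(pts.shape)              # (1200, 1, 2)
```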

Alarm

We originally alarmed on min(τ), where a single noisy cell could trip DANGER. This iteration switches to connected components on the thresholded divergence map; the alarm only fires on coherent blobs (~80×80 px minimum).
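The blob gate can be illustrated with a minimal BFS labelling. This is a numpy/stdlib stand-in to show the logic; a library routine (e.g. OpenCV's connected components) would do the same job in production.

```python
import numpy as np
from collections import deque

# Minimal stand-in for the connected-components alarm: label 4-connected
# blobs in the thresholded divergence mask, alarm only on a coherent blob.
def largest_blob(mask):
    seen = np.zeros_like(mask, dtype=bool)
    best = 0
    H, W = mask.shape
    for sy in range(H):
        for sx in range(W):
            if mask[sy, sx] and not seen[sy, sx]:
                size, q = 0, deque([(sy, sx)])
                seen[sy, sx] = True
                while q:
                    cy, cx = q.popleft()
                    size += 1
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                best = max(best, size)
    return best

mask = np.zeros((20, 20), dtype=bool)
mask[2, 2] = True                  # isolated noisy cell: ignored
mask[8:13, 8:13] = True            # coherent 25-cell blob: alarms
danger = largest_blob(mask) >= 15  # ~15-cell minimum from the text
print(danger)                      # True
```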

Pipeline

flowchart TD
    F["Frame N-1 + Frame N"] --> LK["Lucas-Kanade on uniform grid points"]
    LK --> ERR["Drop tracks with high LK matching error"]
    ERR --> UV["raw u, v per surviving point"]

    UV --> GS["Gaussian smooth on flow field (u, v)"]
    GS -- "np.gradient to get du/dx and dv/dy" --> SOB["Per-cell divergence = (du/dx + dv/dy)"]
    SOB -- "tau = 2 / divergence " --> MS["Median smooth on divergence / tau map"]

    UV -. "raw flow" .-> A1["Arrow GEOMETRY: direction and length"]
    MS -. "smoothed tau" .-> A2["Arrow COLOR: red, yellow, green by tau band"]
    A1 --> VIZ["flow_visualization: Out Image"]
    A2 --> VIZ

    MS --> THR["Threshold mask: divergence above limit"]
    THR --> CC["Connected components on mask"]
    CC --> CHK{"Largest blob area at least 15 cells?"}
    CHK -- yes --> DT["danger_signal: Out Bool = TRUE"]
    CHK -- no --> DF["danger_signal: Out Bool = FALSE"]

   

In the GIFs below:

  • Arrow direction and length = the raw (u, v) per tracked point — what LK actually returned for that frame, noise included.
  • Arrow color (red / yellow / green) = the τ band from the smoothed divergence map — reflects the algorithm's per-cell danger reading after Gaussian + median smoothing. Red = τ < 3 frames, yellow = τ < 6, green = safe / not approaching.
  • DANGER / CLEAR label in the corner = output of the connected-components alarm on the thresholded smoothed divergence map. The alarm fires only if a coherent low-τ blob exceeds ~15 cells (~80×80 px), which filters out single-cell and small noisy blobs.
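The color banding reduces to a tiny lookup (thresholds from the bullets above, in frames; the function name is hypothetical):

```python
def tau_band(tau):
    # map smoothed per-cell tau (in frames) to the arrow color bands above
    if tau < 3.0:
        return "red"      # imminent
    if tau < 6.0:
        return "yellow"   # approaching
    return "green"        # safe / not approaching

print([tau_band(t) for t in (1.0, 4.5, 20.0)])   # ['red', 'yellow', 'green']
```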

When this module is deployed, downstream DimOS consumers subscribe to:

  • danger_signal: Out[Bool] — the safety signal.
  • flow_data: Out[Any] — per-point (x, y, τ, u, v) array for any custom downstream consumer/application.
  • flow_visualization: Out[Image] — the annotated frame.

Demo

uv run python3 scripts/run_on_video.py [video_source]

Outdoor walk that ends at a dumpster - starts CLEAR as the dumpster is still distant, fires DANGER once it comes into range, briefly drops back to CLEAR as the camera reorients across the parking lot, then fires DANGER again at near-contact when the dumpster fills most of the frame:

IMG_4362_dumpster

Indoor scene — same algorithm, walking around a workbench with multiple obstacles (robotic arm, ArUco fixtures, cup):

IMG_4358_workbench

Module integration

OpticalFlowModule follows the standard dimos perception-module pattern (In[Image] for the camera; Out[Bool]/Out[Any]/Out[Image] for the danger signal, flow data, and annotated visualization image). Production deployment is via the existing dimos stack (blueprint + dimos run).

Running the full framework integration on macOS needs extra configuration and is somewhat out of scope for issue #1775, which is about exploring the feasibility of mono-camera optical-flow obstacle avoidance. To run the pipeline without extra setup, scripts/run_on_video.py calls the backend directly on a video stream: same backend code, no framework needed.

Notes

  • Flow field is naturally noisy (LK tracking errors, single-frame jitter). Three layers of noise handling: Gaussian smooth on the flow field before differentiating, median smooth on the divergence map after, and per-point LK reconstruction-error threshold drops bad tracks before they enter divergence at all.
  • Smooth uniform surfaces (the red Solo cup, a plain wood desk) sometimes don't fire even at near-contact. LK needs features to track and smooth surfaces have few.
  • iPhone footage is 30 fps; downsampled by stride=2 to ~15 fps to match the Go2's frame rate.
  • Rotation breaks both LK's brightness-constancy / small-displacement assumptions and the forward-translation assumption behind τ = 2 / divergence. The module exposes angular_velocity: In[Any] and suppresses the danger signal when |ω| > 0.3 rad/s (omega_max).
  • Optical flow is an assumption-heavy approach (brightness constancy, sufficient texture to track, near-translational motion); when the assumptions break, the output degrades. So it is best used as a cheap mono-camera second-opinion / fallback signal, not primary obstacle-avoidance guidance (the Go2 already has LiDAR for that). Where it is most useful is platforms without depth sensors. Some future upgrade thoughts: dense flow (Farneback), learned flow (RAFT), or fusion with IMU / depth.
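The effect of the smoothing layers on a single-cell spike can be seen with a minimal numpy stand-in (a crude 3×3 box filter in place of the actual Gaussian; illustrative only):

```python
import numpy as np

# Minimal stand-in for the smoothing layers: a 3x3 box filter (in place of
# the Gaussian) spreads and attenuates a single-cell divergence spike so it
# can no longer trip a per-cell threshold on its own.
def box_smooth(a, k=3):
    pad = k // 2
    p = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)

div = np.zeros((10, 10))
div[4, 4] = 5.0                  # isolated spike from a bad LK track
smoothed = box_smooth(div)
print(div.max(), round(smoothed.max(), 3))   # 5.0 0.556
```

A coherent blob of high divergence, by contrast, survives the smoothing largely intact, which is what lets the connected-components gate separate real obstacles from tracker noise.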

